Using Attribute Behavior Diversity to Build Accurate Decision Tree Committees for Microarray Data
Authors
Abstract
DNA microarrays (gene chips), frequently used in biological and medical studies, measure the expression levels of thousands of genes per sample. Building accurate disease classifiers from microarray data is an important task. This paper introduces an algorithm called Committee of Decision Trees by Attribute Behavior Diversity (CABD), which builds highly accurate ensembles of decision trees for such data. Because a committee's accuracy depends heavily on the diversity among its member classifiers, CABD uses two new ideas to "optimize" that diversity: (1) attribute behavior-based similarity between attributes, and (2) attribute usage diversity among trees. These ideas are effective for microarray data because such data have many features and the behavior similarity between genes can be high. Experiments on microarray data for six cancers show that CABD significantly outperforms previous ensemble methods and also outperforms SVM. They further show that the diversified features used by CABD's decision tree committee can improve the performance of other classifiers such as SVM. CABD has potential for other high-dimensional data, and its ideas may apply to ensembles of other classifier types.
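The abstract does not give CABD's formulas, so the following Python sketch only illustrates the two ideas under assumptions made for this example, not taken from the paper. Behavior similarity between two genes is assumed to be the absolute Pearson correlation of their expression profiles across samples. Attribute usage diversity is approximated by greedily assigning attributes to trees while discounting any attribute that behaves like one already used. The names `behavior_similarity` and `pick_diverse_attributes`, the per-gene relevance scores, and the toy data are all hypothetical. The sketch also omits growing the actual decision trees.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant gene has no behavior to compare
    return cov / math.sqrt(vx * vy)


def behavior_similarity(
    genes: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Hypothetical behavior similarity: |Pearson r| for every gene pair."""
    names = list(genes)
    sim: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(names):
        for b in names[i:]:
            s = 1.0 if a == b else abs(pearson(genes[a], genes[b]))
            sim[(a, b)] = sim[(b, a)] = s
    return sim


def pick_diverse_attributes(
    relevance: Dict[str, float],
    sim: Dict[Tuple[str, str], float],
    n_trees: int,
    per_tree: int,
) -> List[List[str]]:
    """Greedily assign attributes to trees. Each candidate's relevance is
    scaled by (1 - its highest similarity to any attribute already taken by
    this tree or an earlier one), a stand-in for usage diversity."""
    used: List[str] = []
    committee: List[List[str]] = []
    for _ in range(n_trees):
        chosen: List[str] = []
        for _ in range(per_tree):
            taken = used + chosen
            candidates = [g for g in relevance if g not in chosen]
            if not candidates:
                break
            best = max(
                candidates,
                key=lambda g: relevance[g]
                * (1.0 - max((sim[(g, u)] for u in taken), default=0.0)),
            )
            chosen.append(best)
        used.extend(chosen)
        committee.append(chosen)
    return committee


# Toy data (hypothetical): 4 genes measured on 4 samples.
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],  # g1 scaled by 2, so identical behavior
    "g3": [4.0, 1.0, 3.0, 2.0],
    "g4": [3.0, 1.0, 2.0, 4.0],
}
relevance = {"g1": 0.9, "g2": 0.85, "g3": 0.6, "g4": 0.5}  # hypothetical
sim = behavior_similarity(genes)
committee = pick_diverse_attributes(relevance, sim, n_trees=3, per_tree=1)
print(committee)  # [['g1'], ['g3'], ['g4']]
```

Although g2 has the second-highest relevance, it is never selected, because its profile is a scaled copy of g1's. The committee is spread instead over genes with different behavior. A fuller implementation would derive relevance from the training data (for example, information gain) and grow a complete tree for each member.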
Similar resources
Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services
The rapid growth of information technology (IT) motivates, and creates competitive advantages in, the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to identify target and potential patients, increase patient loyalty and satisfaction, and ultimately maximize their profitability. Many hospitals have large data warehouses containing customer ...
Integrating boosting and stochastic attribute selection committees for further improving the performance of decision tree learning
Techniques for constructing classifier committees, including Boosting and Bagging, have demonstrated great success, especially Boosting for decision tree learning. This type of technique generates several classifiers to form a committee by repeated application of a single base learning algorithm. The committee members vote to decide the final classification. Boosting and Bagging create different classi...
Generating Classifier Committees by Stochastically Selecting both Attributes and Training Examples
Boosting and Bagging, as two representative approaches to learning classifier committees, have demonstrated great success, especially for decision tree learning. They repeatedly build different classifiers using a base learning algorithm by changing the distribution of the training set. Sasc, as a different type of committee learning method, can also significantly reduce the error rate of decision t...
Creating diversity in ensembles using artificial data
The diversity of an ensemble of classifiers is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, Decorate (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple...
Journal title:
- Journal of Bioinformatics and Computational Biology
Volume 10, Issue 4
Publication date: 2012